Abstract: This paper studies the ergodic capacity of time- 
and frequency-selective multipath fading channels in the ul- 
trawideband (UWB) regime when training signals are used 
for channel estimation at the receiver. Motivated by recent 
measurement results on UWB channels, we propose a model 
for sparse multipath channels. A key implication of sparsity is 
that the independent degrees of freedom (DoF) in the channel 
scale sub-linearly with the signal space dimension (product of 
signaling duration and bandwidth). Sparsity is captured by 
the number of resolvable paths in delay and Doppler. Our 
analysis is based on a training and communication scheme 
that employs signaling over orthogonal short-time Fourier (STF) 
basis functions. STF signaling naturally relates sparsity in delay- 
Doppler to coherence in time-frequency. We study the impact 
of multipath sparsity on two fundamental metrics of spectral 
efficiency in the wideband/low-SNR limit introduced by Verdú: 
first- and second-order optimality conditions. Recent results by 
Zheng et al. have underscored the large gap in spectral efficiency 
between coherent and non-coherent extremes and the importance 
of channel learning in bridging the gap. Building on these 
results, our analysis leads to the following implications of multipath 
sparsity: 1) The coherence requirements are shared in both time 
and frequency, thereby significantly relaxing the required scaling 
in coherence time with SNR; 2) Sparse multipath channels are 
asymptotically coherent — for a given but large bandwidth, the 
channel can be learned perfectly and the coherence requirements 
for first- and second-order optimality met through sufficiently 
large signaling duration; and 3) The requirement of peaky 
signals in attaining capacity is eliminated or relaxed in sparse 
environments. 



I. Introduction 

Emerging applications of ultrawideband (UWB) radio tech- 
nology have inspired both academic and industrial research 
on wide-ranging problems. The large bandwidth of UWB 
systems results in fundamentally new channel characteristics 
as evident from recent measurement campaigns [1], [2], [3]. 
This is due to the fact that, analogous to radar, wideband 
waveforms enable multipath resolution in delay at a much 
finer scale - delay resolution increases in direct proportion 
to bandwidth. From a communication-theoretic perspective, 
the number of resolvable multipath components reflects the 
number of independent degrees of freedom (DoF) in the 
channel [4], [5], which in turn governs fundamental limits 
on performance. When the channel coefficients corresponding 
to resolvable multipath are perfectly known at the receiver 
(coherent regime), the DoF reflect the level of delay-Doppler 
diversity afforded by the channel [4], [6]. On the other hand, 
when the channel coefficients are unknown at the receiver 
(non-coherent regime), the DoF reflect the level of uncertainty 
in the channel. The fundamental limits to communication, such 
as capacity, can be radically different in the coherent and non- 
coherent extremes, and communication schemes that explicitly 
or implicitly learn the channel can bridge the gap between the 
extremes [7], [8]. 

In this paper, we study the ergodic capacity of time- and 
frequency-selective UWB channels in the non-coherent regime 
where the channel is explicitly estimated at the receiver using 
training signals. Motivated by recent measurement results, 
our focus is on channels that exhibit sparse multipath, in which the 
number of DoF in the channel scales sub-linearly with the 
signal space dimension (product of signaling duration and 
bandwidth), in contrast to the widely prevalent assumption 
of rich multipath, in which the number of DoF scales 
linearly with the signal space dimension. Whether a multipath 
channel is rich or sparse depends on the operating frequency, 
bandwidth and the scattering environment [1]. For example, 
[2] reports rich channels even for 7.5 GHz bandwidth in 
industrial environments whereas [3] reports sparse multipath 
in residential environments at the same bandwidth. Overall, 
large bandwidths increase the likelihood of channel sparsity 
[1], [9]. In time-selective scenarios, the likelihood of sparsity 
is increased further due to multipath resolution in Doppler. 

The results in this paper build on two recent works that ex- 
plore ergodic capacity of fading channels in the wideband/low- 
SNR regime [7], [8]. The seminal work in [7] shows that 
spectral efficiency in the wideband regime is captured by 
two fundamental metrics: $(E_b/N_0)_{\min}$, the minimum energy per 
bit for reliable communication, and $S_0$, the wideband slope. 
A signaling scheme that achieves $(E_b/N_0)_{\min}$ is termed first-order 
optimal and one that achieves $S_0$ as well is termed second- 
order optimal. The results of [7] also show that knowledge of 
channel state information (CSI) at the receiver imposes a sharp 
cut-off on the achievability of ergodic capacity in the wideband 
regime. In particular, while QPSK signaling is second-order 
optimal when perfect CSI is available (coherent regime), 
flash (peaky) signaling is necessary for first-order optimality 
when no CSI is available (non-coherent regime). However, a 
flash signaling scheme, besides having an unbounded peak-to- 
average ratio (and hence being practically infeasible), also results in 
$S_0 = 0$, thereby violating second-order optimality. 

This apparent sharp cut-off in the peak-to-average ratio of 
capacity achieving signaling schemes between the coherent 
and non-coherent extremes was examined in [8]. If the co- 
herence time of the channel scales at a sufficiently fast rate 
with the bandwidth, Zheng et al. show that a communication 
scheme with explicit training can bridge the gap between the 
two extremes. However, no physical justification is provided 
for the existence of such a scaling in coherence time with 
bandwidth. In other related work, [10] investigates the effect 
of channel uncertainty when using spread-spectrum signals. 
The authors conclude that the number of resolvable channel paths 
needs to scale sub-linearly with bandwidth in order to achieve 
the wideband limit (first-order optimality in [7]). 

We first propose a model for sparse multipath channels 
to capture the physical channel characteristics in the UWB 
regime as observed in recent measurement studies. In a time- 
and frequency-selective environment, multipath components 
can be resolved in delay and Doppler where the resolution 
in delay/Doppler increases with signaling bandwidth/duration 
[5]. A key implication of multipath sparsity is that the number 
of DoF in the channel (resolvable delay-Doppler channel co- 
efficients) scales sub-linearly with the signal space dimension. 
Our analysis of the ergodic capacity of doubly-selective UWB 
channels is based on signaling over short-time Fourier (STF) 
basis functions [11], [6] that are a generalization of OFDM 
signaling and serve as approximate eigenfunctions for under- 
spread channels. Furthermore, STF signaling naturally relates 
multipath sparsity in delay-Doppler to coherence or correlation 
in time and frequency [6]. We consider a communication 
scheme in which explicit training symbols are used to estimate 
the channel at the receiver. The capacity of this scheme is 
then studied to investigate the impact of multipath sparsity on 
achieving coherent capacity. 

The results of this paper lead to several new contributions 
and insights on the impact of sparsity. First, we show that 
multipath sparsity provides a natural physical mechanism 
for scaling of coherence time, $T_{coh}$, with bandwidth/SNR, 
as assumed in [8]. Second, the coherence requirements for 
achieving capacity are shared between both time and frequency: 
the coherence bandwidth, $W_{coh}$, increases with bandwidth, 
$W$ (due to sparsity in delay), and the coherence time, 
$T_{coh}$, increases with signaling duration $T$ (due to sparsity 
in Doppler). As a result, the scaling requirements on $T_{coh}$ 
with $W$ (or $\mathrm{SNR} = P/W$, where $P$ is the total transmit 
power) needed in [8] for first- and second-order optimality 
are replaced by scaling requirements on the time-frequency 
coherence dimension $N_c = T_{coh} W_{coh}$. This leads to significantly 
relaxed requirements on $T_{coh}$ scaling with bandwidth/SNR 
compared to those in [8]. Third, we show that 
sparse multipath channels are asymptotically coherent; that is, 
for a sufficiently large but fixed bandwidth, the conditions for 
first- and second-order optimality can be achieved simply by 
making the signaling duration sufficiently large. We quantify 
the required (power-law) scaling in T with W for first- and 
second-order optimality as a function of channel sparsity. This 
asymptotic coherence of sparse channels is also manifested 
in the performance of the training scheme: consistent channel 
estimation is achieved with a vanishing fraction of energy 
expended on training. The asymptotic coherence of sparse 
channels also eliminates/relaxes the need for peaky signaling 
that has been emphasized in existing results [12], [7] on 
non-coherent capacity, implicitly based on a rich multipath 
assumption. We discuss how sparsity and peakiness can be 
traded off suitably depending on system design requirements. 
Finally, the results in this paper are shown to hold in general, 
independent of the type of scaling laws used to model sparsity. 

The paper is organized as follows. The system setup, including 
the sparse channel model and training-based STF signaling 
scheme, is described in Section II. In Section III, we study the 
ergodic capacity of sparse channels with perfect CSI and for 
the training-based communication scheme. A discussion of the 
results, including their relation to existing work, is provided 
in Section IV. Numerical results are provided to illustrate the 
implications of the theoretical results. Concluding remarks and 
directions for future work are discussed in Section V. 

II. System Setup 

In this section, we first propose a model for sparse multipath 
channels in terms of the number of paths that are resolvable 
in delay and Doppler. We then develop a system model based 
on orthogonal short-time Fourier (STF) signaling and propose 
a block fading channel model that naturally relates multipath 
sparsity in delay-Doppler to coherence in time-frequency. We 
then describe the training-based communication scheme in the 
STF domain whose capacity is investigated in this paper. 

A. Sparse Multipath Channel Modeling 

We consider a single-user single-antenna communication 
system in complex baseband 

$$ y(t) = \int_{0}^{T_m} \int_{-W_d/2}^{W_d/2} h(\tau,\nu)\, x(t-\tau)\, e^{j 2\pi \nu t}\, d\nu\, d\tau + w(t) \tag{1} $$

where the channel is characterized by the delay-Doppler 
spreading function, $h(\tau,\nu)$, and $x(t)$, $y(t)$ and $w(t)$ represent 
the transmitted, received and additive white Gaussian noise 
(AWGN) waveforms, respectively. $T_m$ and $W_d$ represent the 
delay and Doppler spreads of the channel. We assume an 
underspread channel, $T_m W_d \ll 1$, which is valid for most 
radio channels. A physical discrete multipath channel can be 
modeled as 

$$ h(\tau,\nu) = \sum_n \beta_n\, \delta(\tau - \tau_n)\, \delta(\nu - \nu_n), \qquad y(t) = \sum_n \beta_n\, x(t - \tau_n)\, e^{j 2\pi \nu_n t} + w(t) \tag{2} $$

where $\beta_n$, $\tau_n \in [0, T_m]$ and $\nu_n \in [-W_d/2, W_d/2]$ denote the 
complex path gain, delay and Doppler shift associated with the 
n-th path. Note that the above model assumes that the carrier 
frequency is much larger than the signaling bandwidth so 
that the effects of motion are accurately captured via Doppler 
shifts (the shrinking or dilation of the signaling waveforms is 
ignored). 

The physical model (2), while accurate, is complex to 
analyze from a communication-theoretic perspective due to 
the non-linear dependence on the propagation parameters $\tau_n$ and 
$\nu_n$. We instead use a linear virtual representation [5], [4] for 
time- and frequency-selective multipath channels that captures 
the channel characteristics in terms of resolvable paths and 
greatly facilitates analysis. Throughout the paper, we consider 
signaling over a duration $T$ and (two-sided) bandwidth $W$. 
The virtual representation, illustrated in Fig. 1(a), uniformly 
samples the multipath in delay and Doppler at a resolution 
commensurate with $W$ and $T$, respectively [5], [4] 

$$ y(t) = \sum_{\ell=0}^{L} \sum_{m=-M}^{M} h_{\ell,m}\, x(t - \ell/W)\, e^{j 2\pi m t / T} + w(t) \tag{3} $$
$$ h_{\ell,m} \approx \sum_{n \in S_{\tau,\ell} \cap S_{\nu,m}} \beta_n \tag{4} $$

where $L = \lceil T_m W \rceil$, $M = \lceil T W_d / 2 \rceil$, $S_{\tau,\ell} = \{n : \ell/W - 1/2W < \tau_n < \ell/W + 1/2W\}$ 
denotes the set of all paths 
whose delays lie within the delay resolution bin of width 
$\Delta\tau = 1/W$ centered around the $\ell$-th resolvable (virtual) 
delay, $\hat{\tau}_\ell = \ell/W$, and $S_{\nu,m} = \{n : m/T - 1/2T < \nu_n < m/T + 1/2T\}$ 
denotes the set of all paths whose Doppler shifts 
lie within the Doppler resolution bin of width $\Delta\nu = 1/T$ 
centered around the $m$-th resolvable (virtual) Doppler shift, 
$\hat{\nu}_m = m/T$. The sampled representation (3) is linear and is 
characterized by the virtual delay-Doppler channel coefficients 
$\{h_{\ell,m}\}$. Expression (4) states that the channel coefficient 
$h_{\ell,m}$ consists of the sum of gains of all paths whose delays 
and Doppler shifts lie within the $(\ell, m)$-th delay-Doppler 
resolution bin of width $\Delta\tau \times \Delta\nu$ centered around the sampling 
point $(\hat{\tau}_\ell, \hat{\nu}_m) = (\ell/W, m/T)$ in the $(\tau, \nu)$ (delay-Doppler) 
space. It follows that distinct $h_{\ell,m}$'s correspond to approximately^1 
disjoint subsets of paths and are hence approximately 
statistically independent (due to independent path phases). 
This approximation gets more accurate with increasing $T$ and 
$W$, due to higher delay-Doppler resolution, and we assume 
that the channel coefficients $\{h_{\ell,m}\}$ are perfectly independent. 
We also assume Rayleigh fading in which $\{h_{\ell,m}\}$ are zero- 
mean Gaussian random variables.^2 Thus, for Rayleigh fading, 
the channel statistics are characterized by the power in the 
virtual channel coefficients 

$$ \Psi(\ell, m) = E\left[|h_{\ell,m}|^2\right] \approx \sum_{n \in S_{\tau,\ell} \cap S_{\nu,m}} E\left[|\beta_n|^2\right]. \tag{5} $$

We define dominant non-zero channel coefficients, $h_{\ell,m}$'s, 
as those which contribute significant channel power; that is, 
the coefficients for which $\Psi(\ell, m) > \gamma$ for some prescribed 
threshold $\gamma > 0$.^3 In Fig. 1(a), the delay-Doppler resolution 
bins with a dot in them represent the dominant channel coefficients. 
Let $D$ denote the number of dominant non-zero channel 
coefficients; that is, $D = |\{(\ell, m) : \Psi(\ell, m) > \gamma\}|$. The 
parameter D reflects the (dominant) statistically independent 
degrees of freedom (DoF) in the channel and also signifies the 
delay-Doppler diversity afforded by the channel. Furthermore, 

1 Approximate due to finite $T$ and $W$. 

2 This would be true if, for example, there are a sufficiently large number of 
unresolvable paths contributing to each $h_{\ell,m}$ in (4). 

3 The choice of the threshold $\gamma$ depends on the operating SNR; a 
discussion of this choice is beyond the scope of this paper. 



we decompose $D$ as $D = D_T D_W$ where $D_T$ denotes the 
Doppler/time diversity and $D_W$ the frequency/delay diversity. 
The channel DoF or delay-Doppler diversity is bounded as: 

$$ D = D_T D_W \le D_{\max} = D_{T,\max} D_{W,\max}, \qquad D_{T,\max} = \lceil T W_d \rceil, \quad D_{W,\max} = \lceil T_m W \rceil \tag{6} $$

where $D_{T,\max}$ denotes the maximum number of resolvable 
paths in Doppler (maximum Doppler/time diversity) and 
$D_{W,\max}$ denotes the maximum number of resolvable paths in 
delay (maximum delay/frequency diversity). Note that $D_{T,\max}$ 
and $D_{W,\max}$ increase linearly with $T$ and $W$, respectively. 
$D = D_{\max}$ represents a rich multipath environment in which 
each resolution bin in Fig. 1(a) corresponds to a dominant 
channel coefficient. 

However, from recent measurement campaigns [1], [13], 
[14] for UWB channels, there is growing experimental evi- 
dence that the dominant channel coefficients get sparser in 
delay as the bandwidth increases. Most existing measurement 
results are for indoor UWB channels and do not consider the 
effect of Doppler. We are interested in modeling scenarios with 
Doppler effects, as well, due to motion. In such cases, as we 
consider large bandwidths and/or long signaling durations, the 
resolution of paths in both delay and Doppler domains gets 
finer, leading to the scenario in Fig. 1(a) where the delay- 
Doppler resolution bins are sparsely populated with paths, i.e., 
$D < D_{\max}$. 

We formally model multipath sparsity with a sub-linear 
scaling in $D_T$ and $D_W$ with $T$ and $W$: 

$$ D_T \sim (T W_d)^{\delta_1}, \qquad D_W \sim (T_m W)^{\delta_2}, \qquad \delta_1, \delta_2 \in [0, 1] \tag{7} $$

where the smaller the value of $\delta_i$, the slower (sparser) the 
growth in the resolvable paths in the corresponding domain. 
Note that this directly implies that the total number of delay- 
Doppler DoF, $D = D_T D_W$, scales sub-linearly with the 
number of signal space dimensions $N = TW$. 

Remark 1: We focus on the power-law scaling in (7) as 
a concrete example for studying the impact of sparsity on 
capacity. As discussed in Sec. IV-F, the results of this paper 
hold true for arbitrary sub-linear scaling laws. 

Remark 2: With perfect CSI at the receiver, the parame- 
ter D denotes the delay-Doppler diversity afforded by the 
channel, whereas with no CSI, it reflects the level of channel 
uncertainty; the number of channel parameters that need to be 
estimated for coherent processing at the receiver. 
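
To make the contrast between the sparse scaling in (7) and the rich-multipath bound in (6) concrete, the following Python sketch evaluates both for one parameter set. The function name `dof_counts` and the numerical values are illustrative assumptions, and the proportionality in (7) is treated here as an equality with unit constant, which the paper does not claim.

```python
import math


def dof_counts(T, W, T_m, W_d, delta1, delta2):
    """Return (D_T, D_W, D, D_max) for the power-law sparsity model.

    Assumption: the scaling D_T ~ (T*W_d)**delta1 and D_W ~ (T_m*W)**delta2
    in (7) is taken as an equality with unit proportionality constant.
    """
    D_T = (T * W_d) ** delta1          # resolvable paths in Doppler
    D_W = (T_m * W) ** delta2          # resolvable paths in delay
    D_max = math.ceil(T * W_d) * math.ceil(T_m * W)  # rich-multipath bound (6)
    return D_T, D_W, D_T * D_W, D_max


if __name__ == "__main__":
    # Hypothetical parameters: 10 us delay spread, 50 Hz Doppler spread,
    # 50 MHz bandwidth, 1 s signaling duration.
    for delta in (1.0, 0.5):
        D_T, D_W, D, D_max = dof_counts(1.0, 50e6, 1e-5, 50.0, delta, delta)
        print(f"delta={delta}: D={D:.1f}, D_max={D_max}, D/D_max={D / D_max:.4f}")
```

With both exponents at 0.5, the number of dominant coefficients is a small fraction of the number of delay-Doppler resolution bins, which is the sparse regime sketched in Fig. 1(a).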

B. Orthogonal Short-Time Fourier Signaling 

We consider signaling using an orthonormal short-time 
Fourier (STF) basis [6], [11] that is a natural generalization of 
orthogonal frequency-division multiplexing (OFDM) for time- 
varying channels.^4 An orthogonal STF basis for the signal 
space is generated from a fixed prototype waveform $g(t)$ via 
time and frequency shifts: $\phi_{\ell m}(t) = g(t - \ell T_o)\, e^{j 2\pi m W_o t}$, where 
$T_o W_o = 1$, $\ell = 0, \cdots, N_T - 1$, $m = 0, \cdots, N_W - 1$ and 
$N = N_T N_W = TW$ with $N_T = T/T_o$, $N_W = W/W_o$. The 
transmitted signal can be represented as 

$$ x(t) = \sum_{\ell=0}^{N_T-1} \sum_{m=0}^{N_W-1} x_{\ell m}\, \phi_{\ell m}(t), \qquad 0 \le t \le T \tag{8} $$

where $\{x_{\ell m}\}$ represent the $N$ transmitted symbols that are 
modulated onto the STF basis waveforms. At the receiver, the 
received signal is projected onto the STF basis waveforms to 
yield the received symbols 

$$ y_{\ell m} = \langle y, \phi_{\ell m} \rangle = \sum_{\ell', m'} h_{\ell m, \ell' m'}\, x_{\ell' m'} + w_{\ell m}. \tag{9} $$

4 STF signaling can be considered as OFDM signaling over a block of 
OFDM symbol periods and with an appropriately chosen OFDM symbol 
duration. 

Fig. 1. (a) Delay-Doppler sampling commensurate with signaling duration and bandwidth. (b) Time-frequency coherence subspaces in STF signaling. (c) 
Illustration of the training-based communication scheme in the STF domain. One dimension in each coherence subspace (dark squares) represents the training 
dimension and the remaining dimensions are used for communication. 



We can represent the system using an $N$-dimensional matrix 
equation 

$$ y = \sqrt{\mathrm{SNR}}\, H x + w \tag{10} $$

where $w$ represents the additive noise vector whose entries 
are i.i.d. $\mathcal{CN}(0, 1)$. The $N \times N$ matrix $H$ consists of the channel 
coefficients $\{h_{\ell m, \ell' m'}\}$ in (9). The parameter SNR represents 
the transmit energy per modulated symbol and for a given 
transmit power $P$ equals $\mathrm{SNR} = \frac{TP}{N} = \frac{P}{W}$. In this work, our 
focus is on the UWB regime, where $\mathrm{SNR} \to 0$ as $W \to \infty$ 
for a fixed $P$. 

For sufficiently underspread channels, the parameters $T_o$ and 
$W_o$ can be matched to $T_m$ and $W_d$ so that the STF basis 
waveforms serve as approximate eigenfunctions of the channel 
[11], [6]; that is, (9) simplifies to^5 $y_{\ell m} \approx h_{\ell m} x_{\ell m} + w_{\ell m}$. Thus 
the $N \times N$ channel matrix $H$ is approximately diagonal. In 
this work, we assume that $H$ is exactly diagonal; that is, 

$$ H = \mathrm{diag}\Big[ \underbrace{h_{11} \cdots h_{1 N_c}}_{\text{Subspace } 1},\ \underbrace{h_{21} \cdots h_{2 N_c}}_{\text{Subspace } 2},\ \cdots,\ \underbrace{h_{D 1} \cdots h_{D N_c}}_{\text{Subspace } D} \Big]. \tag{11} $$

The diagonal entries of $H$ in (11) also admit an intuitive 
block fading interpretation in terms of time-frequency coherence 
subspaces [6] illustrated in Fig. 1(b). The signal space 
is partitioned as $N = TW \approx N_c D$ where $D$ represents the 
number of statistically independent time-frequency coherence 
subspaces, reflecting the DoF in the channel or the delay- 
Doppler diversity (see (6)), and $N_c$ represents the dimension of 
each coherence subspace, which we refer to as the coherence 
dimension. In the block fading model in (11), the channel 
coefficients over the $i$-th coherence subspace, $h_{i1}, \cdots, h_{i N_c}$, are 
assumed to be identical, $h_i$, whereas the coefficients across 
different coherence subspaces are independent. Furthermore, 
due to the stationarity of the channel statistics across time and 
frequency, the different $h_i$ are identically distributed. Thus, 
the $D$ distinct STF channel coefficients, $\{h_i\}$, corresponding 
to the $D$ independent coherence subspaces, are i.i.d. zero-mean 
Gaussian random variables (Rayleigh fading). The variance of 
each channel coefficient is equal to $E[|h_i|^2] = \sum_n E[|\beta_n|^2]$, 
which we normalize to unity [6]. 

5 The STF channel coefficients are different from the delay-Doppler coefficients, 
even though we are using the same symbols. 

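Using the DoF scaling for sparse channels in (7), the 
coherence dimension can be computed as 

$$ T_{coh} = \frac{T}{D_T} = \frac{T^{1-\delta_1}}{W_d^{\delta_1}} \tag{12} $$
$$ W_{coh} = \frac{W}{D_W} = \frac{W^{1-\delta_2}}{T_m^{\delta_2}} \tag{13} $$
$$ N_c = T_{coh} W_{coh} = \frac{T^{1-\delta_1}\, W^{1-\delta_2}}{W_d^{\delta_1}\, T_m^{\delta_2}} \ge \frac{1}{T_m W_d} \tag{14} $$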



where $T_{coh}$ is the coherence time and $W_{coh}$ is the coherence 
bandwidth of the channel, as illustrated in Fig. 1(b). Note 
that $\delta_1 = \delta_2 = 1$ corresponds to a rich multipath channel 
in which $N_c = 1/(T_m W_d)$ is constant and $D = D_{\max}$ 
increases linearly with $N = TW$. This is the assumption 
prevalent in existing works. In contrast, for sparse channels, 
$(\delta_1, \delta_2) \in (0, 1)$, and both $N_c$ and $D$ increase sub-linearly 
with $N$. 

The coherence dimension plays a key role in our analysis. 
In terms of channel parameters, $N_c$ increases with decreasing 
$T_m W_d$ as well as with smaller $\delta_i$. In terms of signaling 
parameters, $N_c$ can be increased by increasing $T$ and/or $W$. 
On the other hand, when the channel is rich, $N_c$ depends only 
on $T_m W_d$ and does not scale with $T$ or $W$. 
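
The dependence of the coherence dimension on the signaling and channel parameters can be checked numerically. The sketch below is a minimal illustration of (12)-(14), again assuming that the power-law scaling holds with equality; the function name `coherence_dimension` and the parameter values are hypothetical.

```python
def coherence_dimension(T, W, T_m, W_d, delta1, delta2):
    """Return (T_coh, W_coh, N_c) following (12)-(14).

    Assumption: D_T = (T*W_d)**delta1 and D_W = (T_m*W)**delta2 exactly.
    """
    T_coh = T / (T * W_d) ** delta1   # coherence time, T / D_T
    W_coh = W / (T_m * W) ** delta2   # coherence bandwidth, W / D_W
    return T_coh, W_coh, T_coh * W_coh


if __name__ == "__main__":
    # Hypothetical channel: T_m = 10 us, W_d = 50 Hz.
    for delta in (1.0, 0.5):
        for W in (50e6, 100e6):
            _, _, n_c = coherence_dimension(1.0, W, 1e-5, 50.0, delta, delta)
            print(f"delta={delta}, W={W:.0e} Hz: N_c={n_c:.1f}")
```

In the rich case the printed coherence dimension stays at 1/(T_m W_d) regardless of W, while in the sparse case it grows with W, which is the behavior described in the preceding paragraph.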

Using (13), we note that 

$$ W_{coh} = \frac{W^{1-\delta_2}}{T_m^{\delta_2}} = \frac{P^{1-\delta_2}}{T_m^{\delta_2}\, \mathrm{SNR}^{1-\delta_2}} \tag{15} $$

and thus $W_{coh}$ naturally scales with SNR. Using (15), the 
expression for $N_c$ in (14) becomes 

$$ N_c = \frac{T^{1-\delta_1}\, P^{1-\delta_2}}{(W_d)^{\delta_1}\, (T_m)^{\delta_2}\, \mathrm{SNR}^{1-\delta_2}}. \tag{16} $$



Our focus is on computing the sparse channel capacity and, 
as we will see later in Section III, capacity turns out to be 
a function only of the parameters $N_c$ and SNR. Furthermore, 
the following relation between $N_c$ and $\mathrm{SNR} = P/W$ plays a 
key role in our analysis 

$$ N_c = \frac{1}{\mathrm{SNR}^{\mu}}, \qquad \mu > 0 \tag{17} $$

where the parameter $\mu$ reflects the level of channel coherence. 
Equating (17) with (16) leads to the following canonical 
relationship 

$$ T = \frac{\left(T_m^{\delta_2}\, W_d^{\delta_1}\right)^{\frac{1}{1-\delta_1}}}{P^{\frac{\mu}{1-\delta_1}}}\; W^{\frac{\mu - 1 + \delta_2}{1-\delta_1}} \tag{18} $$

that relates the signaling parameters $(T, W, P)$, as a function 
of the channel parameters, in order to satisfy (17). Equations 
(17) and (18) are the two key equations that capture the impact 
of sparsity and we will revisit them in Section IV. We next 
describe the training-based communication scheme in the STF 
domain that serves as the workhorse of the capacity analysis 
in this paper. 
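
The relationship between (16), (17) and (18) can be sanity-checked by solving for the signaling duration numerically. The following sketch is an illustration only: the function names and parameter values are assumptions, the power-law model is taken with equality, and the Doppler sparsity exponent must satisfy delta1 < 1 for the solution to exist.

```python
def coherence_dim(T, W, T_m, W_d, delta1, delta2):
    """Coherence dimension N_c = T_coh * W_coh for the power-law model."""
    return T ** (1 - delta1) * W ** (1 - delta2) / (W_d ** delta1 * T_m ** delta2)


def required_duration(W, P, T_m, W_d, delta1, delta2, mu):
    """Signaling duration T for which N_c = SNR**(-mu), with SNR = P / W.

    Obtained by equating the power-law form of N_c with SNR**(-mu) and
    solving for T; requires delta1 < 1.
    """
    snr = P / W
    n_c_target = snr ** (-mu)
    # N_c = T**(1-delta1) * W**(1-delta2) / (W_d**delta1 * T_m**delta2)
    t_pow = n_c_target * W_d ** delta1 * T_m ** delta2 / W ** (1 - delta2)
    return t_pow ** (1.0 / (1.0 - delta1))


if __name__ == "__main__":
    # Hypothetical setting: W = 50 MHz, P chosen so that SNR = 0.02.
    W, P, T_m, W_d = 50e6, 1e6, 1e-5, 50.0
    for mu in (1.5, 3.0):
        T = required_duration(W, P, T_m, W_d, 0.5, 0.5, mu)
        print(f"mu={mu}: T={T:.4g} s, N_c={coherence_dim(T, W, T_m, W_d, 0.5, 0.5):.4g}")
```

For a fixed bandwidth and power, a larger coherence exponent is reached simply by signaling over a longer duration, which is the sense in which sparse channels become coherent asymptotically.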

C. Training-Based Communication Using STF Signaling 

Our interest is primarily in the non-coherent scenario when 
there is no CSI at the receiver a priori. We focus on a com- 
munication scheme in which the transmitted signals include 
training symbols to enable channel estimation and coherent 
detection. Although it is argued in [8], [15] that training- 
based schemes are sub-optimal from a capacity point of view, 
the restriction to training schemes is motivated by practical 
considerations. We assume that both the transmitter and the 
receiver have knowledge of channel statistics (values of $T_m$, 
$W_d$, $\delta_1$ and $\delta_2$ in our model). 

We now describe the training-based communication scheme, 
adapted from [8] to STF signaling. The total energy available 
for training and communication is $PT$, of which a fraction $\eta$ is 
used for training and the remaining fraction $(1 - \eta)$ is used for 
communication. Since the quality of the channel estimate over 
one coherence subspace depends only on the training energy 
and not on the number of training symbols [15], our scheme 
uses one signal space dimension in each coherence subspace 
for training and the remaining $(N_c - 1)$ for communication, as 
illustrated in Fig. 1(c). We consider minimum mean squared 
error (MMSE) channel estimation and the two metrics that 
capture channel estimation performance are (i) $\eta$, the fraction 
of energy used for estimation, and (ii) MSE, the mean squared 
error in estimating each channel coefficient. 

The training energy to estimate the channel coefficient in 
one coherence subspace is given by 

$$ E_{tr} = \frac{\eta\, T P}{D} \overset{(a)}{=} \eta\, N_c\, \mathrm{SNR} \tag{19} $$

where (a) follows from the fact that $\mathrm{SNR} = P/W$ and $N_c D = TW$. 
Recall that $N = N_c D = TW = N_T N_W$ and 
$D = D_T D_W$. Similarly we partition $N_c = N_{c,T} N_{c,W}$ where 
$N_{c,T} = N_T / D_T$ is the temporal coherence dimension and 
$N_{c,W} = N_W / D_W$ is the spectral coherence dimension; 
these represent the number of STF basis functions that lie within 
$T_{coh}$ and $W_{coh}$, respectively (see Fig. 1(c)). The following 
equations describe training in the STF system: 

$$ y_{\ell m} = \sqrt{E_{tr}}\, h_{\ell m}\, x_{\ell m} + w_{\ell m}, \qquad \ell = (i-1) N_{c,T} + 1, \quad m = (j-1) N_{c,W} + 1, \quad i = 1, \cdots, D_T, \quad j = 1, \cdots, D_W \tag{20} $$

where $\{x_{\ell m}\}$ are the $D$ training symbols (with $|x_{\ell m}|^2 = 1$) 
known at the receiver that are used to estimate the $D$ channel 
coefficients $\{h_{\ell m}\}$ with $E[|h_{\ell m}|^2] = 1$. 

The communication energy per transmitted data symbol is 
given by $E_{cm} = \frac{(1-\eta)\, T P}{(N_c - 1) D} = \frac{(1-\eta)\, N_c\, \mathrm{SNR}}{(N_c - 1)}$. The communication 
component of the system can be described by 

$$ y_{\ell' m'} = \sqrt{E_{cm}}\, h_{\ell' m'}\, x_{\ell' m'} + w_{\ell' m'}, $$
$$ \ell' = (i-1) N_{c,T} + 2, \cdots, i N_{c,T}, \qquad m' = (j-1) N_{c,W} + 2, \cdots, j N_{c,W}, \qquad i = 1, \cdots, D_T, \quad j = 1, \cdots, D_W \tag{21} $$



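where $\{x_{\ell' m'}\}$ now represent the $(N_c - 1) D$ communication 
symbols with $E[|x_{\ell' m'}|^2] = 1$. We can rewrite (21) as 

$$ y_{\ell' m'} = \sqrt{E_{cm}}\, \hat{h}_{\ell' m'}\, x_{\ell' m'} + \sqrt{E_{cm}}\, \Delta_{\ell' m'}\, x_{\ell' m'} + w_{\ell' m'} \tag{22} $$

where $\hat{h}_{\ell' m'}$ is the MMSE estimate of $h_{\ell' m'}$, obtained from the 
training observation $y_{\ell m}$ in (20) of the same coherence subspace, and is given by 

$$ \hat{h}_{\ell' m'} = \frac{\sqrt{E_{tr}}}{1 + E_{tr}}\, y_{\ell m}\, x_{\ell m}^{*} $$

and $\Delta_{\ell' m'} = h_{\ell' m'} - \hat{h}_{\ell' m'}$ is the error in the estimate. The 
resulting MSE is given by 

$$ \mathrm{MSE}(\eta, N_c, \mathrm{SNR}) = E\left[|\Delta_{\ell' m'}|^2\right] = \frac{1}{1 + E_{tr}} = \frac{1}{1 + \eta\, N_c\, \mathrm{SNR}}. \tag{23} $$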

We are now ready to compute the ergodic capacity of the 
training-based communication system. 
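
The MSE expression for the training phase can be checked with a short Monte Carlo simulation. The sketch below is a minimal illustration that assumes a single unit-modulus training symbol per coherence subspace, unit-variance Rayleigh fading and unit-variance complex Gaussian noise; the function names and the chosen values of the training fraction, coherence dimension and SNR are hypothetical.

```python
import math
import random


def training_energy(eta, n_c, snr):
    """Training energy per coherence subspace, E_tr = eta * N_c * SNR."""
    return eta * n_c * snr


def simulate_mmse_mse(e_tr, num_trials=100_000, seed=0):
    """Empirical MSE of the scalar MMSE estimate from one training symbol."""
    rng = random.Random(seed)
    s = math.sqrt(0.5)  # per-component std of a unit-variance complex Gaussian
    total = 0.0
    for _ in range(num_trials):
        h = complex(rng.gauss(0, s), rng.gauss(0, s))   # Rayleigh-fading coefficient
        w = complex(rng.gauss(0, s), rng.gauss(0, s))   # unit-variance noise
        x = 1.0                                          # known training symbol, |x|^2 = 1
        y = math.sqrt(e_tr) * h * x + w                  # training observation
        h_hat = math.sqrt(e_tr) / (1.0 + e_tr) * y * x   # MMSE estimate (x is real, so x* = x)
        total += abs(h - h_hat) ** 2
    return total / num_trials


if __name__ == "__main__":
    e_tr = training_energy(eta=0.2, n_c=200, snr=0.1)
    print("empirical MSE:", simulate_mmse_mse(e_tr))
    print("predicted MSE:", 1.0 / (1.0 + e_tr))
```

The empirical value should agree with 1/(1 + E_tr) up to Monte Carlo fluctuation, and it depends on the training fraction, coherence dimension and SNR only through their product.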

III. Ergodic Capacity of the Training-Based 
Communication Scheme 

We first characterize the coherent capacity of the wideband 
channel with perfect CSI at the receiver which serves as a 
benchmark. The coherent capacity per dimension (in bps/Hz) 
is 



$$ C_{coh}(\mathrm{SNR}) = \sup_{Q:\ \mathrm{Tr}(Q) \le TP} \frac{E\left[\log_2 \det\left(I_{N_c D} + H Q H^{H}\right)\right]}{N_c D} \tag{24} $$

where $P$ denotes transmit power and $H$ is the diagonal, block- 
fading channel matrix in (11). The optimization is over the set 
of positive semi-definite transmit covariance matrices $Q$. Due 
to the diagonal nature of $H$, the optimal $Q$ is also diagonal. 
In particular, the uniform power allocation $Q = \frac{TP}{N_c D} I_{N_c D} = \mathrm{SNR}\, I_{N_c D}$ 
achieves capacity and 

$$ C_{coh}(\mathrm{SNR}) = \frac{\sum_{i=1}^{D} E\left[\log_2\left(1 + \frac{TP}{N_c D} |h_i|^2\right)\right]}{D} \overset{(a)}{=} E\left[\log_2\left(1 + \mathrm{SNR}\, |h|^2\right)\right] \tag{25} $$

where (a) follows since $\{h_i\}$ are i.i.d. with $h$ representing a 
generic random variable, $N_c D = TW$ and $\mathrm{SNR} = \frac{P}{W}$. 

The next proposition provides upper and lower bounds to 
the coherent capacity in the low SNR regime. 

Proposition 1: For all $b \in (0, 1)$ and $\mathrm{SNR} = \frac{P}{W}$ such that 
$\mathrm{SNR} < [\ldots]$, the coherent capacity satisfies 

$$ C_{coh}(\mathrm{SNR}) \ge \log_2(e) \left(\mathrm{SNR} - \mathrm{SNR}^2\right) $$
$$ C_{coh}(\mathrm{SNR}) \le \log_2(e) \left(\mathrm{SNR} - [\ldots]\, \mathrm{SNR}^2\right). \tag{26} $$

Moreover, the capacity converges to the lower bound as $\mathrm{SNR} \to 0$. 

Proof: See Appendix A. ■ 
The lower bound in Proposition 1 shows that the minimum 
energy per bit for reliable communication is given by $(E_b/N_0)_{\min} = \log_e(2)$ 
and the wideband slope $S_0 = 1$, the two fundamental 
metrics defined in [7]. 
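
The low-SNR behavior of the coherent capacity can be examined numerically. The sketch below evaluates E[log2(1 + SNR |h|^2)] for Rayleigh fading (so that |h|^2 is exponentially distributed with unit mean) by trapezoidal integration and compares it with the second-order lower bound log2(e)(SNR - SNR^2). The function names, integration range and step size are illustrative choices, not part of the original analysis.

```python
import math


def coherent_capacity(snr, x_max=60.0, step=1e-3):
    """E[log2(1 + snr*|h|^2)] for |h|^2 ~ Exp(1), by trapezoidal integration."""
    n = int(round(x_max / step))
    total = 0.0
    for k in range(n + 1):
        x = k * step
        f = math.log2(1.0 + snr * x) * math.exp(-x)
        total += f if 0 < k < n else 0.5 * f   # half weight at the two endpoints
    return total * step


def low_snr_lower_bound(snr):
    """Second-order expression log2(e) * (SNR - SNR^2)."""
    return math.log2(math.e) * (snr - snr ** 2)


if __name__ == "__main__":
    for snr in (0.1, 0.05, 0.01):
        c = coherent_capacity(snr)
        lb = low_snr_lower_bound(snr)
        print(f"SNR={snr}: capacity={c:.6f}, lower bound={lb:.6f}, gap/SNR^2={(c - lb) / snr ** 2:.4f}")
```

The normalized gap shrinks as the SNR decreases, consistent with the statement that the capacity converges to the lower bound in the low-SNR limit.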

We now define the notion of an operational coherence level 
[8] that allows an alternative, but equivalent, characterization 
of capacity in the wideband/low-SNR regime. 

Definition 1: Let $I_{tr}$ be the average mutual information 
achievable with a training-based communication scheme. We 
say that the scheme achieves an operational coherence level of 
$\epsilon$ ($0 \le \epsilon \le 1$) if the low SNR asymptote of $I_{tr}$ is of the form 
$\mathrm{SNR} - O\left(\mathrm{SNR}^{1+\epsilon}\right)$. Note that the two values $\epsilon = 0$ and 
$\epsilon = 1$ correspond to the first-order and second-order optimality 
conditions, respectively, as defined in [7]. ■ 

In the scaling law $N_c = \frac{1}{\mathrm{SNR}^{\mu}}$, $\mu > 0$, in (17), the 
parameter $\mu$ reflects the coherence achieved by the training- 
based communication scheme. We are interested in computing 
the value of $\mu$ such that the training-based scheme achieves an 
operational coherence level of $\epsilon$. This relation is characterized 
in Theorem 1. We start with the following lemma that provides 
a lower bound to the capacity of the training-based scheme. 

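Lemma 1: The capacity of the training-based communication 
scheme described in Sec. II-C is lower bounded by 

$$ I_{tr}(\eta, N_c, \mathrm{SNR}) \ge \underline{I}_{tr}(\eta, N_c, \mathrm{SNR}) \triangleq \frac{N_c - 1}{2 N_c} \log_2\left(1 + 2 \beta \sigma^2\right) \tag{27} $$

where 

$$ \beta(\eta, N_c, \mathrm{SNR}) = \frac{(1-\eta)\,(1 + \eta N_c \mathrm{SNR})\, N_c \mathrm{SNR}}{(N_c - 1)(1 + \eta N_c \mathrm{SNR}) + (1-\eta) N_c \mathrm{SNR}} \tag{28} $$
$$ \sigma^2(\eta, N_c, \mathrm{SNR}) = \frac{\eta N_c \mathrm{SNR}}{1 + \eta N_c \mathrm{SNR}}. \tag{29} $$

Proof: See Appendix B. ■ 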
Next, we optimize over the fraction of energy spent for 
training, $\eta$, to maximize the lower bound $\underline{I}_{tr}$. Thus, we 
explicitly highlight the role of $\eta$ in the following lemma. 

Lemma 2: The $\eta$ that maximizes $\underline{I}_{tr}(\eta)$ given in (27) 
satisfies $\frac{dK(\eta)}{d\eta} = 0$ where $K(\eta) = K = \beta \sigma^2$ and $\beta$ and 
$\sigma^2$ are as in (28) and (29), respectively. The optimizing value 
$\eta^*$ and the corresponding $K^*$ are given by 

$$ \eta^* = \frac{N_c \mathrm{SNR} + N_c - 1}{(N_c - 2)\, N_c \mathrm{SNR}} \left[ \sqrt{1 + \frac{N_c \mathrm{SNR}\,(N_c - 2)}{N_c \mathrm{SNR} + N_c - 1}} - 1 \right] \tag{30} $$
$$ K^* = \frac{N_c \mathrm{SNR} + N_c - 1}{(N_c - 2)^2} \left[ \sqrt{1 + \frac{N_c \mathrm{SNR}\,(N_c - 2)}{N_c \mathrm{SNR} + N_c - 1}} - 1 \right]^2. \tag{31} $$

Furthermore, the optimized (tightest) lower bound is given by 

$$ \underline{I}_{tr}(\eta^*) = \frac{1}{2}\left(1 - \frac{1}{N_c}\right) \log_2\left(1 + 2 K^*\right). \tag{32} $$

Proof: See Appendix C. ■ 
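
The optimization in Lemma 2 lends itself to a quick numerical cross-check. The sketch below is an illustration under our reading of (28)-(31): it evaluates K as the product of the two quantities in (28) and (29), computes the closed-form optimizer, and compares it against a brute-force grid search over the training fraction. The function names are hypothetical and the coherence dimension must exceed 2.

```python
import math


def effective_snr(eta, n_c, snr):
    """K(eta) = beta * sigma^2 for training fraction eta (requires n_c > 2)."""
    a = n_c * snr
    beta = (1 - eta) * (1 + eta * a) * a / ((n_c - 1) * (1 + eta * a) + (1 - eta) * a)
    sigma2 = eta * a / (1 + eta * a)
    return beta * sigma2


def optimal_training(n_c, snr):
    """Closed-form maximizer eta* of K(eta) and the maximum value K*."""
    a = n_c * snr
    A = a + n_c - 1
    B = a * (n_c - 2)
    root = math.sqrt(1 + B / A) - 1
    eta_star = (A / B) * root
    k_star = A / (n_c - 2) ** 2 * root ** 2
    return eta_star, k_star


if __name__ == "__main__":
    n_c, snr = 10, 1.0
    eta_star, k_star = optimal_training(n_c, snr)
    grid_best = max(effective_snr(i / 1000.0, n_c, snr) for i in range(1, 1000))
    print(f"eta*={eta_star:.5f}, K*={k_star:.5f}, grid maximum={grid_best:.5f}")
    # Low-SNR scaling with N_c = SNR**(-mu), mu = 2: the loss SNR - K*
    # behaves like 2 * SNR**1.5 for small SNR.
    snr = 1e-3
    _, k_low = optimal_training(snr ** -2, snr)
    print("normalized loss:", (snr - k_low) / snr ** 1.5)
```

The second printout illustrates how the penalty for channel learning shrinks relative to the SNR itself when the coherence dimension grows as an inverse power of the SNR, which is the mechanism behind the scaling conditions that follow.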
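We now state the main result of this work. The following 
theorem characterizes the required scaling of $N_c$ (value of $\mu$) 
so that any operational coherence level $\epsilon$ can be achieved. 

Theorem 1: The average mutual information of the 
training-based scheme achieves an operational coherence level 
$\epsilon \in [0, 1]$, 

$$ I_{tr} \ge \log_2(e) \cdot \left[\mathrm{SNR} - O\left(\mathrm{SNR}^{1+\epsilon}\right)\right] \tag{33} $$

if and only if $N_c = \frac{1}{\mathrm{SNR}^{\mu}}$ for $\mu > 1 + 2\epsilon$. More precisely, if 
$\epsilon \in [0, 1)$ and $N_c = \frac{1}{\mathrm{SNR}^{\mu}}$, $\mu = 1 + 2\epsilon$, then 

$$ I_{tr} \ge \log_2(e) \cdot \left[\mathrm{SNR} - 2\, \mathrm{SNR}^{1+\epsilon} + o\left(\mathrm{SNR}^{1+\epsilon}\right)\right]. \tag{34} $$

If $\epsilon = 1$ and $N_c = \frac{1}{\mathrm{SNR}^{\mu}}$, $\mu = 1 + 2\epsilon = 3$, then 

$$ I_{tr} \ge \log_2(e) \cdot \left[\mathrm{SNR} - 3\, \mathrm{SNR}^2 + o\left(\mathrm{SNR}^2\right)\right]. \tag{35} $$

If $\epsilon = 1$ and $N_c = \frac{1}{\mathrm{SNR}^{\mu}}$, $\mu > 1 + 2\epsilon = 3$, then 

$$ I_{tr} \ge \log_2(e) \cdot \left[\mathrm{SNR} - \mathrm{SNR}^2 + o\left(\mathrm{SNR}^2\right)\right]. \tag{36} $$

In particular, the first- and second-order optimality conditions 
(corresponding to $\epsilon = 0$ and $\epsilon = 1$) are met if and only if 
$\mu > 1$ and $\mu > 3$, respectively. 

Proof: See Appendix D. ■ 
Theorem 1 and equation (18) are key to understanding the 
impact of sparsity on achieving coherent capacity in the UWB 
regime. This is discussed in the next section. 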

IV. Discussion of Results 

A. The Coherence Dimension: Sharing Coherence Costs in 
Time and Frequency 

Multipath sparsity provides a natural mechanism for channel coherence, and our results underscore the impact of sparsity in both delay and Doppler via the notion of the time-frequency coherence dimension, $N_c$. As discussed in Section III-A, in sparse channels $D_W$ and $W_{\mathrm{coh}}$ increase sub-linearly with $W$. Furthermore, unlike existing works, we explicitly account for Doppler diversity ($D_T$ and $T_{\mathrm{coh}}$ increase sub-linearly with $T$), since STF signaling involves coding over multiple coherence times.

Theorem 1 shows that the requirement on $T_{\mathrm{coh}}$ in [8] is now the requirement on the time-frequency coherence dimension $N_c = T_{\mathrm{coh}} W_{\mathrm{coh}}$. Thus, the coherence cost is shared in both time and frequency, and as a result the required scaling for $T_{\mathrm{coh}}$ can be significantly weakened by taking advantage of the natural scaling of $W_{\mathrm{coh}}$ with $W$. If the delay diversity is known to scale as $D_W = O(W^{\delta_2}) \Leftrightarrow W_{\mathrm{coh}} = O(W^{1-\delta_2})$, then the $T_{\mathrm{coh}}$ scaling requirement reduces to

$$T_{\mathrm{coh}} = N_c / W_{\mathrm{coh}} = O\left(W^{2\epsilon + \delta_2}\right) \tag{37}$$

to achieve an operational coherence level $\epsilon$, as per Definition 1. For example, using $\epsilon = 0.5$, which corresponds to a sub-linear term of $\mathrm{SNR}^{1.5}$ in (33), and $\delta_2 = 0.5$, we get $T_{\mathrm{coh}} = O(W^{1.5})$. This is a less stringent scaling law than would be required using the framework of [8], where the requirement would be $T_{\mathrm{coh}} = O(W^{1+2\epsilon}) = O(W^{2})$. The weaker $T_{\mathrm{coh}}$ requirement for sparse channels is graphically illustrated in Fig. 2(a) for the following parameters: $T_m = 10^{-5}$ s, $W_d = 50$ Hz, $W = 50$ MHz. Note that as the channel becomes more sparse in delay (decreasing $\delta_2$), $W_{\mathrm{coh}}$ gets larger, thereby reducing the $T_{\mathrm{coh}}$ requirement to achieve any desired operational coherence $\epsilon$.
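The comparison of exponents above reduces to simple arithmetic. As a minimal sketch (function names are hypothetical), the following Python snippet evaluates the exponent of $W$ in the $T_{\mathrm{coh}}$ requirement for a sparse channel, $2\epsilon + \delta_2$ from (37), against the exponent $1 + 2\epsilon$ attributed to the framework of [8].

```python
def tcoh_exponent_sparse(eps, delta2):
    """Exponent x in T_coh = O(W^x) for a sparse channel, per (37)."""
    return 2.0 * eps + delta2


def tcoh_exponent_rich(eps):
    """Exponent x in T_coh = O(W^x) when W_coh does not scale with W."""
    return 1.0 + 2.0 * eps


if __name__ == "__main__":
    eps, delta2 = 0.5, 0.5  # the example values used in the text
    print(tcoh_exponent_sparse(eps, delta2), tcoh_exponent_rich(eps))
```

For any $\delta_2 < 1$ the sparse exponent is strictly smaller, and the gap $1 - \delta_2$ is exactly the exponent with which $W_{\mathrm{coh}}$ grows with $W$.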




B. Asymptotic Coherence of Sparse Channels 

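Since channel uncertainty is the main factor that affects capacity in the non-coherent scenario, we further investigate the performance of channel estimation using two metrics: (i) the MSE of the channel estimates and (ii) the optimal fraction of total energy used for estimation, $\eta^*$. The following theorem characterizes the value of $\mu$ for asymptotically energy-efficient and consistent estimation.

Theorem 2: In the limit of large signal space dimension ($T, W \to \infty$),

$$\eta^* \to 0 \quad \text{and} \quad \mathrm{MSE} = \frac{1}{1 + E_{\mathrm{tr}}} \to 0 \tag{38}$$

if and only if $N_c = 1/\mathrm{SNR}^{\mu}$ and $\mu > 1$. Furthermore, the rates of convergence are given by

$$\eta^* \to 0 \ \text{as} \ O\left(\frac{1}{\sqrt{N_c\,\mathrm{SNR}}}\right) = O\left(\mathrm{SNR}^{\frac{\mu-1}{2}}\right) = O\left(W^{-\frac{\mu-1}{2}}\right), \qquad E_{\mathrm{tr}} \to \infty \ \text{as} \ O\left(\sqrt{N_c\,\mathrm{SNR}}\right) = O\left(\mathrm{SNR}^{-\frac{\mu-1}{2}}\right) = O\left(W^{\frac{\mu-1}{2}}\right). \tag{39}$$

Proof: See Appendix E. ■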
The above result says that multipath wireless channels are asymptotically coherent if and only if they are sparse and $N_c$ satisfies the condition ($\mu > 1$) specified in Theorem 2. For rich multipath, $N_c$ is a constant ($N_c = \frac{1}{T_m W_d}$) and does not scale with SNR. For a sparse channel with $\mu < 1$, $N_c$ does not scale at a fast enough rate with SNR. Under both scenarios, as shown in the proof of the theorem, the training scheme asymptotically uses half the total energy ($\eta^* \to 0.5$) to estimate the channel coefficients and the MSE does not decay to zero. For $\mu = 1$, the estimation performance is better than when $\mu < 1$, but still not good enough to obtain asymptotic coherence. These observations are illustrated in Fig. 2(b), where $\eta^*$ and MSE are plotted as a function of increasing bandwidth for three different cases: $\mu = 0.7$, $\mu = 1$ and $\mu = 1.3$. In all three cases, the signaling duration $T$ is chosen according to (18).

Note that the requirement ($\mu > 1$) for asymptotic coherence in Theorem 2 is exactly the same as the condition to achieve first-order optimality in Theorem 1. This makes intuitive sense: with diminishing channel uncertainty ($\mathrm{MSE} \to 0$) and a vanishing fraction of the energy ($\eta^* \to 0$) used for estimation, the capacity of the training-based system converges to coherent capacity in the wideband limit.

Fig. 2. (a) The variation of $T_{\mathrm{coh}}$ and $W_{\mathrm{coh}}$ as a function of delay sparsity ($\delta_2$). (b) MSE and $\eta^*$ for the channel estimation scheme as a function of $W$ for three different values of $\mu$.
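The three regimes described above can be reproduced numerically. The following Python sketch is illustrative only: it assumes the closed form of $\eta^*$ from Lemma 2, sets $N_c = \mathrm{SNR}^{-\mu}$, and uses $E_{\mathrm{tr}} = \eta^* N_c\,\mathrm{SNR}$ with $\mathrm{MSE} = 1/(1 + E_{\mathrm{tr}})$ as in Theorem 2. The function name is hypothetical.

```python
import math


def eta_star_and_mse(mu, snr):
    """Return (eta*, MSE) for N_c = SNR^(-mu), using the assumed form of (30)."""
    n_c = snr ** (-mu)
    a = n_c * snr + n_c - 1.0
    b = n_c * snr * (n_c - 2.0)
    eta = (a / b) * (math.sqrt(1.0 + b / a) - 1.0)
    e_tr = eta * n_c * snr          # training energy per coherence subspace
    return eta, 1.0 / (1.0 + e_tr)


if __name__ == "__main__":
    for mu in (0.7, 1.0, 1.3):      # the three cases plotted in Fig. 2(b)
        print(mu, eta_star_and_mse(mu, 1e-12))
```

At very low SNR the pair tends to $(0.5, 1)$ for $\mu < 1$, to $(\sqrt{2} - 1, 1/\sqrt{2}) \approx (0.414, 0.707)$ for $\mu = 1$, and decays slowly toward $(0, 0)$ for $\mu > 1$, in line with the rates in (39).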



C. Optimal Choice of Signaling Parameters 

Recall the discussion in Section III-B, in particular equation (18), which relates the signaling parameters $(T, W, P)$ for achieving a desired scaling of $N_c$ with SNR in (17). We now revisit this relationship in light of Theorem 1 and investigate the choice of signaling parameters in order to obtain a desired level of operational coherence $\epsilon$ (in particular, the values for first- and second-order optimality, $\epsilon = 0$ and $\epsilon = 1$, respectively).

Theorem 1 states that to achieve an operational coherence $\epsilon$, the coherence dimension must scale as

$$N_c = \frac{1}{\mathrm{SNR}^{\mu}}, \qquad \mu \ge 1 + 2\epsilon \tag{40}$$

and by taking the logarithm of (18), we note that the signaling duration $T$ must scale with $W$ as a function of $P$ and the channel sparsity parameters as

$$\log(T) = \frac{1}{1 - \delta_1} \log\left(T_m^{\delta_2} W_d^{\delta_1}\right) + \frac{\delta_2 + \mu - 1}{1 - \delta_1} \log(W) - \frac{\mu}{1 - \delta_1} \log(P). \tag{41}$$

For example, with $T_m W_d = 10^{-6}$, $P = 30$ dB, $W = 1$ GHz and a sparsity of $\delta_1 = \delta_2 = 0.5$, the required minimum signaling duration to obtain first-order optimality ($\epsilon = 0$, $\mu > 1$) is $T \approx 1$ ms.

Note from (41) that smaller $\delta_i$'s imply a slower scaling of $T$ with $W$. Conversely, for a given $T$ and $W$, (41) can be used to determine the effective value of $\mu$ in (40) as

$$\mu_{\mathrm{eff}} = \frac{(1 - \delta_1) \log(T/c) + (1 - \delta_2) \log(P)}{\log(W/P)} + (1 - \delta_2) \tag{42}$$

where $c = \left(T_m^{\delta_2} W_d^{\delta_1}\right)^{\frac{1}{1 - \delta_1}}$. The effective operational coherence level can then be determined as $\epsilon_{\mathrm{eff}} = \frac{\mu_{\mathrm{eff}} - 1}{2}$.
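The worked example can be reproduced with a few lines of Python. This is a sketch under stated assumptions: it takes the $T$ versus $W$ relation implied by (42), namely $T = c\,W^{(\mu - 1 + \delta_2)/(1 - \delta_1)} / P^{\mu/(1 - \delta_1)}$, treats $P$ as a linear quantity (30 dB corresponds to $10^3$), and splits the product $T_m W_d = 10^{-6}$ into the hypothetical values $T_m = 10^{-8}$ s and $W_d = 100$ Hz, which are not taken from the paper.

```python
import math


def scaling_constant(t_m, w_d, d1, d2):
    """c = (T_m^d2 * W_d^d1)^(1 / (1 - d1)), as defined after (42)."""
    return (t_m ** d2 * w_d ** d1) ** (1.0 / (1.0 - d1))


def signaling_duration(mu, w, p, t_m, w_d, d1, d2):
    """T required for N_c = SNR^(-mu), assuming the power-law sparsity model."""
    c = scaling_constant(t_m, w_d, d1, d2)
    return c * w ** ((mu - 1.0 + d2) / (1.0 - d1)) / p ** (mu / (1.0 - d1))


def mu_eff(t, w, p, t_m, w_d, d1, d2):
    """Effective mu for given T and W, following (42)."""
    c = scaling_constant(t_m, w_d, d1, d2)
    num = (1.0 - d1) * math.log(t / c) + (1.0 - d2) * math.log(p)
    return num / math.log(w / p) + (1.0 - d2)


if __name__ == "__main__":
    params = dict(t_m=1e-8, w_d=100.0, d1=0.5, d2=0.5)  # hypothetical split of T_m*W_d
    t = signaling_duration(1.0, 1e9, 1e3, **params)
    print(t, mu_eff(t, 1e9, 1e3, **params))
```

Inverting the duration through `mu_eff` recovers the value of $\mu$ that was used to compute it, which is a quick consistency check between (41) and (42).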

Note that $\mu_{\mathrm{eff}} \to \infty$ as $T \to \infty$ for sparse channels, which implies that any operational level of coherence can be achieved by simply increasing $T$. This is due to multipath sparsity in Doppler and is illustrated in Fig. 3, where we consider the low-SNR asymptote of the coherent capacity in (26). The coefficients of the first- and second-order terms are $A_1 = \log_2(e)$ and $A_2 = -\log_2(e)$, respectively. In Fig. 3 we plot the numerically estimated values $c_1$ and $c_2$ of $A_1$ and $A_2$, respectively, for the training-based scheme, which are estimated using Monte-Carlo simulations and using the optimized lower bound on $I_{\mathrm{tr}}$ in (32). For a large enough $T$ such that $\mu_{\mathrm{eff}} > 1$, the first-order constant $c_1 \to A_1 = \log_2(e)$. Also shown in the figure is the behavior of the second-order constant: for an even larger value of $T$, we obtain $c_2 \to A_2 = -\log_2(e)$ when $\mu_{\mathrm{eff}} > 3$.

D. Peaky versus Non-Peaky Signaling 

Several works have emphasized the necessity of signaling 
schemes that are peaky in time and/or frequency for achieving 
wideband capacity in the non-coherent regime [12], [16], [7]. 
The motivation behind peaky signaling is that communication 
takes place over a smaller set of signaling dimensions, thereby 
reducing the effect of channel uncertainty since fewer channel 
parameters need to be estimated. However, peaky signaling 
is practically infeasible due to peak power constraints. More 
importantly, the requirement of peakiness in these works is 
tied with the implicit assumption of rich multipath. 

When the channel is sparse, the coherence dimension $N_c$ naturally scales with the signal space dimension ($N = TW$), and this new effect raises the following question: Is peaky signaling still necessary to achieve capacity in the wideband limit? Theorem 1 provides the answer: as long as $\mu > 1$, non-peaky i.i.d. Gaussian input signals are first-order optimal, and with $\mu > 3$, second-order optimality is also satisfied. While the authors in [8, Lemma 2] (using a non-peaky training-based communication scheme) obtained exactly the same conditions on $\mu$, their results are for the scaling of $T_{\mathrm{coh}}$, whereas our scaling result in Theorem 1 is for $N_c = T_{\mathrm{coh}} W_{\mathrm{coh}}$. In order to weaken the $T_{\mathrm{coh}}$ requirement, the authors in [8, Lemma 3] advocate the use of peaky training and communication. Furthermore, the capacity-optimal scheme according to [8, Theorem 4] is a peaky non-coherent communication scheme in which no explicit training is performed. Next, we present a detailed discussion on the scaling laws of $T$ as a function of $W$ to achieve a desired level of operational coherence. To illustrate the impact of sparsity, we compare the scaling requirements in this paper with those in [8].

Fig. 3. Numerically estimated values of capacity metrics. Convergence of the coefficients of the SNR and $\mathrm{SNR}^2$ terms in capacity as a function of $T$.

From (41), we note that to achieve an operational coherence level of $\epsilon$, $T$ must scale with $W$ as

$$T_{\mathrm{sparse}} \propto W^{\frac{2\epsilon + \delta_2}{1 - \delta_1}} \tag{43}$$

where the subscript on $T$ emphasizes that it applies to sparse channels. On the other hand, the corresponding scaling on $T$ for either the peaky or the non-peaky training-based communication scheme in [8, Lemma 2 and 3] can be inferred as

$$T_{\mathrm{rich}} \propto \mathrm{SNR}^{-(1 + 2\epsilon)} \propto W^{1 + 2\epsilon}. \tag{44}$$

This is because when there is no peakiness, the minimum signaling duration is $T = T_{\mathrm{coh}} \propto \mathrm{SNR}^{-(1 + 2\epsilon)}$. When peaky training and communication is used, $T = L \cdot T_{\mathrm{coh}} \propto [\mathrm{SNR}^{-(1 - \epsilon)}] \cdot [\mathrm{SNR}^{-3\epsilon}] = \mathrm{SNR}^{-(1 + 2\epsilon)}$.

Thus, (43) yields a slower (less stringent) scaling than (44) when

$$\frac{2\epsilon + \delta_2}{1 - \delta_1} < 1 + 2\epsilon \iff (1 + 2\epsilon)\,\delta_1 + \delta_2 < 1. \tag{45}$$

The locus of points in the $(\delta_1, \delta_2)$ plane represented in (45) defines the set of channel sparsity values for which we obtain a slower scaling requirement. This is pictorially represented in Fig. 4(a) for the special cases of $\epsilon = 0$ (first-order optimality) and $\epsilon = 1$ (second-order optimality).

Fig. 4. (a) Regions in the $(\delta_1, \delta_2)$ plane comparing the required $T$ vs. $W$ scaling in the NP-TS and P-TS schemes. Points to the left of the $\delta_1 + \delta_2 = 1$ line represent the favorable region for first-order optimality ($\epsilon = 0$) of NP-TS, illustrated in (c); for points to the right of this line, P-TS yields more favorable scaling, illustrated in (d). Points to the left of the $3\delta_1 + \delta_2 = 1$ line represent the favorable region for second-order optimality ($\epsilon = 1$) of NP-TS, illustrated in (b). (b)-(d): $T$ vs. $W$ scaling comparison for the two schemes for different levels of sparsity. (b) High sparsity: $\delta_1 = 0.1$ and $\delta_2 = 0.3$. (c) Medium sparsity: $\delta_1 = 0.3$ and $\delta_2 = 0.4$. (d) Low sparsity: $\delta_1 = 0.8$ and $\delta_2 = 0.9$.

Figs. 4(b)-(d) illustrate the required scaling of $T$ with $W$ for different levels of channel sparsity. In all figures, the non-peaky training-based scheme in our framework is denoted by NP-TS, whereas the peaky training scheme in [8] is denoted by P-TS. The signaling duration requirements for P-TS are independent of channel sparsity and are given by

$$T_{\mathrm{p\text{-}ts},1} \propto W, \qquad T_{\mathrm{p\text{-}ts},2} \propto W^{3} \tag{46}$$

where the subscripts "1" and "2" reflect the requirements for first- and second-order optimality, respectively. Fig. 4(b) compares the scaling requirements for the sparsest channel: $\delta_1 = 0.1$ and $\delta_2 = 0.3$, so that $3\delta_1 + \delta_2 < 1$. In this case, the scaling requirements for NP-TS are

$$T_{\mathrm{np\text{-}ts},1} \propto W^{1/3} < W, \qquad T_{\mathrm{np\text{-}ts},2} \propto W^{2.3/0.9} < W^{3} \tag{47}$$

which are less stringent than (46) for both first- and second-order optimality. Fig. 4(c) corresponds to a medium sparse channel: $\delta_1 = 0.3$ and $\delta_2 = 0.4$. In this case, the scaling requirements for NP-TS are

$$T_{\mathrm{np\text{-}ts},1} \propto W^{0.4/0.7} < W, \qquad T_{\mathrm{np\text{-}ts},2} \propto W^{2.4/0.7} > W^{3} \tag{48}$$

which are less stringent than (46) for first-order optimality but more stringent for second-order optimality. Fig. 4(d) represents the least sparse channel: $\delta_1 = 0.8$ and $\delta_2 = 0.9$, so that $\delta_1 + \delta_2 > 1$. In this case, the scaling requirements for NP-TS are

$$T_{\mathrm{np\text{-}ts},1} \propto W^{0.9/0.2} > W, \qquad T_{\mathrm{np\text{-}ts},2} \propto W^{2.9/0.2} > W^{3} \tag{49}$$

which are more stringent than (46) for both first- and second-order optimality.
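The three sparsity cases above can be tabulated programmatically. The following Python sketch (with hypothetical function names) computes the NP-TS exponent $(2\epsilon + \delta_2)/(1 - \delta_1)$ from (43), the P-TS exponent $1 + 2\epsilon$ from (44), and evaluates the condition (45) that decides which scheme has the slower scaling.

```python
def np_ts_exponent(eps, d1, d2):
    """Exponent of W in the NP-TS signaling duration, per (43)."""
    return (2.0 * eps + d2) / (1.0 - d1)


def p_ts_exponent(eps):
    """Exponent of W in the P-TS signaling duration, per (44)."""
    return 1.0 + 2.0 * eps


def np_ts_is_slower(eps, d1, d2):
    """Condition (45): NP-TS needs a slower scaling of T with W."""
    return (1.0 + 2.0 * eps) * d1 + d2 < 1.0


if __name__ == "__main__":
    cases = {"high": (0.1, 0.3), "medium": (0.3, 0.4), "low": (0.8, 0.9)}
    for name, (d1, d2) in cases.items():
        for eps in (0.0, 1.0):  # first- and second-order optimality
            print(name, eps, np_ts_exponent(eps, d1, d2), p_ts_exponent(eps),
                  np_ts_is_slower(eps, d1, d2))
```

The condition in `np_ts_is_slower` is just a rearrangement of the exponent comparison, so the two always agree away from the boundary $(1 + 2\epsilon)\delta_1 + \delta_2 = 1$.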

E. Rich versus Sparse Multipath: The Extreme Cases 

We now discuss the two extreme scenarios of rich and sparse multipath, i.e., $\delta_i \to 0$ or $1$, $i = 1, 2$. The canonical scaling relationship in (18) between $T$ and $W$ (ignoring constants) is

$$T \propto W^{\frac{\mu - 1 + \delta_2}{1 - \delta_1}}. \tag{50}$$

As either $\delta_1$ or $\delta_2$ or both tend to zero, we have a very sparse channel in which any desired value of $\mu$ can be obtained with relatively small values of $T$ by following (50).

When $\delta_2 \to 1$, the conditions on $T$ in (50) grow more stringent in order to attain a desired $\mu$. When $\delta_2 = 1$, $W_{\mathrm{coh}}$ is a constant and the requirements on $N_c$ can be attained only through $T_{\mathrm{coh}}$ scaling with increasing $T$. In particular, the conditions on $T$ in (50) become

$$T \propto W^{\frac{\mu}{1 - \delta_1}}. \tag{51}$$

As $\delta_1 \to 1$, the conditions on $T$ to attain a desired $\mu$ become more stringent. When $\delta_1 = 1$, we have a constant $T_{\mathrm{coh}}$ and, from a scaling perspective, $N_c = W_{\mathrm{coh}} \propto W^{1 - \delta_2} \propto \frac{1}{\mathrm{SNR}^{1 - \delta_2}}$. Thus the attained value of $\mu$ is $\mu = 1 - \delta_2 < 1$, and even first-order optimality cannot be obtained.

This issue can be resolved by considering peaky signaling schemes, which also help offset the large $T$ requirements when $\delta_1$ and/or $\delta_2$ is close to 1. We model peaky signaling by assuming that a subset of the time-frequency coherence subspaces in each codeword (Fig. 1(b)) is used for training and communication and no information is sent in the remaining subspaces. We model peakiness similar to [8] and define

$$\zeta = \mathrm{SNR}^{\gamma}, \qquad \gamma > 0 \tag{52}$$

as the fraction of signal space dimensions which are used for communication. The effect of peakiness is captured through the parameter $\gamma$. More specifically, the peakiness ratio (PR) between peaky and non-peaky signaling is given by $\mathrm{PR} = \frac{1}{\zeta} = \mathrm{SNR}^{-\gamma} \to \infty$ as $\mathrm{SNR} \to 0$ since $\gamma > 0$. It is clear that $\gamma < 1$, since the energy per transmit symbol equals

$$\mathrm{SNR}' = \frac{\mathrm{SNR}}{\zeta} = \frac{\mathrm{SNR}}{\mathrm{SNR}^{\gamma}} = \mathrm{SNR}^{1 - \gamma} \tag{53}$$

and $\mathrm{SNR}' \ge 1$ when $\gamma \ge 1$, in which case we are no longer in the low SNR regime. The following result captures the impact of peakiness on the average mutual information of the training-based scheme.

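Proposition 2: The peaky training-based scheme achieves

$$I_{\mathrm{tr}}^{\mathrm{peaky}}(\mathrm{SNR}) \ge \log_2(e) \left[ \mathrm{SNR} - O\left(\mathrm{SNR}^{\frac{1 + \mu}{2}}\right) \right] \tag{54}$$

if $N_c = 1/\mathrm{SNR}^{\mu - \gamma}$.

Proof: The average mutual information with a peaky input equals

$$I_{\mathrm{tr}}^{\mathrm{peaky}}(\mathrm{SNR}) = \zeta\, I_{\mathrm{tr}}(\mathrm{SNR}') = \mathrm{SNR}^{\gamma}\, I_{\mathrm{tr}}(\mathrm{SNR}') \tag{55}$$

where $I_{\mathrm{tr}}(\mathrm{SNR}')$ is the average mutual information achievable with the non-peaky scheme, as in (33) of Theorem 1. Therefore, if

$$N_c = \frac{1}{\mathrm{SNR}^{\mu - \gamma}} = \frac{1}{(\mathrm{SNR}')^{\frac{\mu - \gamma}{1 - \gamma}}},$$

then, using (53) and (33), we have

$$\begin{aligned} I_{\mathrm{tr}}^{\mathrm{peaky}}(\mathrm{SNR}) &\ge \log_2(e) \cdot \mathrm{SNR}^{\gamma} \left[ \mathrm{SNR}' - O\left( (\mathrm{SNR}')^{\frac{1 + \mu - 2\gamma}{2(1 - \gamma)}} \right) \right] \\ &\overset{(a)}{=} \log_2(e) \cdot \mathrm{SNR}^{\gamma} \left[ \mathrm{SNR}^{1 - \gamma} - O\left( \mathrm{SNR}^{\frac{1 + \mu}{2} - \gamma} \right) \right] \\ &= \log_2(e) \left[ \mathrm{SNR} - O\left( \mathrm{SNR}^{\frac{1 + \mu}{2}} \right) \right] \end{aligned} \tag{56}$$

where (a) follows from (53). This proves the proposition. ■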
Thus the advantage of using a peaky input manifests itself in reducing the required SNR exponent of the coherence dimension, $N_c$. That is, the effective $\mu$ reduces to $\mu_{\mathrm{peaky}} = \mu - \gamma$. Using the result of Proposition 2, we now revisit the scaling law in (18). As a consequence of the condition $N_c = 1/\mathrm{SNR}^{\mu - \gamma}$, we obtain a slower (relaxed) scaling of $T$ as a function of $W$ to achieve a desired value of $\mu$:

$$T \propto W^{\frac{\mu - \gamma - 1 + \delta_2}{1 - \delta_1}}. \tag{57}$$

For any $0 < \delta_1, \delta_2 < 1$, the rate at which $T$ scales with $W$ can now be controlled through the peakiness parameter $\gamma$, especially when $\delta_i \to 1$. More importantly, when $\delta_1 = 1$, we have $N_c = W_{\mathrm{coh}} = \frac{1}{\mathrm{SNR}^{1 - \delta_2}}$ and therefore we can satisfy the condition $N_c = 1/\mathrm{SNR}^{\mu - \gamma}$ as long as

$$\gamma \ge \mu + \delta_2 - 1. \tag{58}$$

Note that while we can obtain first-order optimality in this case, and necessarily through peaky signaling, second-order optimality is not feasible since it requires $\gamma > (2 + \delta_2) > 1$. When $\delta_2 = 1$, peakiness is not necessary, but the scaling requirements on $T$ can be relaxed from (51) to

$$T \propto W^{\frac{\mu - \gamma}{1 - \delta_1}}. \tag{59}$$

F. Arbitrary Sub-linear Scaling Laws 

We modeled sparsity in delay and Doppler by restricting our attention to the power-law scaling in (14). We now show that the results in this paper hold true for any sub-linear scaling in the DoF. Since sparsity in delay/Doppler implies that $W_{\mathrm{coh}}$ and $T_{\mathrm{coh}}$ scale (sub-linearly) with $W$ and $T$ respectively, we assume a general scaling law for these quantities. Let

$$W_{\mathrm{coh}} = f_1(W), \qquad T_{\mathrm{coh}} = f_2(T) \tag{60}$$

$$\Rightarrow \quad N_c = T_{\mathrm{coh}} W_{\mathrm{coh}} = f_1(W)\, f_2(T) \tag{61}$$

where $f_1$ and $f_2$ are strictly increasing, arbitrary sub-linear functions of $W$ and $T$ respectively. That is, $f_1(W) \sim o(W)$ and $f_2(T) \sim o(T)$. Note that the definition in (60) implies that $D_W = \frac{W}{f_1(W)} \sim o(W)$ and $D_T = \frac{T}{f_2(T)} \sim o(T)$. We also assume

$$T = f_3(W) \tag{62}$$

where $f_3$ reflects the scaling of $T$ with $W$ necessary to obtain a desired value of $\mu$. Given $f_1$ and $f_2$, our focus here is to find a suitable $f_3$ so that a desired value of $\mu$ can be obtained.

A key observation from Theorem 1 is that it provides necessary and sufficient conditions for first- and second-order optimality that are independent of the power-law scaling assumptions in (14). Recall that with $N_c = \frac{1}{\mathrm{SNR}^{\mu}}$, the condition for first-order optimality is $\mu > 1$ and that for second-order optimality is $\mu > 3$. Defining a new parameter $E_d = N_c\,\mathrm{SNR} = \mathrm{SNR}^{1 - \mu}$, which has the physical interpretation of the transmit energy per DoF, we have in the limit of $\mathrm{SNR} \to 0$, $E_d \to \infty$ as $O\left(\frac{1}{\mathrm{SNR}^{\mu - 1}}\right)$, with $\mu > 1$ and $\mu > 3$ for first- and second-order optimality, respectively. Using (61) and (62), we have

$$\begin{aligned} E_d = N_c\,\mathrm{SNR} &= f_1(W)\, f_2(T)\, \mathrm{SNR} \\ &= f_1(W)\, f_2(f_3(W))\, \mathrm{SNR} \\ &= f_1(W)\, g_1(W)\, \mathrm{SNR} \end{aligned} \tag{63}$$

where we have defined $g_1(x) = (f_2 \circ f_3)(x)$. We also provide the following definition that is used in the subsequent theorem.



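Definition 2: For any two functions $f$ and $g$, we define

$$f(x) \sim \omega(g(x)) \iff \lim_{x \to \infty} \frac{f(x)}{g(x)} > 0. \tag{64}$$

Theorem 3: For the coherence scaling laws in (60) and (61), a necessary and sufficient condition to obtain a desired value of $\mu$ is given by $f_1(x)\, g_1(x) \sim \omega(x^{\mu})$.

Proof: Using (63) and noting that $\mathrm{SNR} = \frac{P}{W}$, we have

$$N_c\,\mathrm{SNR} = f_1\left(\tfrac{P}{\mathrm{SNR}}\right) g_1\left(\tfrac{P}{\mathrm{SNR}}\right) \mathrm{SNR}.$$

Therefore, to obtain a specific $\mu$, we require (with $x \propto 1/\mathrm{SNR}$)

$$\frac{f_1(x)\, g_1(x)}{x} \sim \omega\left(\frac{1}{\mathrm{SNR}^{\mu - 1}}\right) = \omega\left(x^{\mu - 1}\right). \tag{65}$$

Note that the conditions for first- and second-order optimality are $f_1(x)\, g_1(x) \sim \omega(x)$ and $f_1(x)\, g_1(x) \sim \omega(x^{3})$, respectively.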

Corollary 1: For given $f_1$ and $f_2$, the conditions of Theorem 3 are satisfied by choosing $f_3(x) = f_2^{-1}\left(\frac{x^{\mu}}{f_1(x)}\right)$.

Remark 3: The conditions of Theorem 3 are satisfied under the power-law scaling assumptions in (14) and the $T$ vs. $W$ scaling relationship in (18). We have $f_1(x) = x^{1 - \delta_2}$, $f_2(x) = x^{1 - \delta_1}$, $f_3(x) = x^{\frac{\mu - 1 + \delta_2}{1 - \delta_1}}$, and it follows that $f_1(x)\, g_1(x) = x^{\mu}$.

G. Comments on Channel Modeling 

A couple of comments on the channel model used in this 
paper are warranted. First, the block fading channel model in 
the STF domain used in this paper is an idealization of the 
effects of multipath sparsity in delay-Doppler. The idealized 
model was used to facilitate capacity analysis by relating 
the sub-linear scaling in the channel DoF in delay-Doppler 
to the scaling in the time-frequency coherence dimension 
under STF signaling. While the actual channel in the STF 
domain would exhibit more complex characteristics, the block 
fading idealization does capture the essence of multipath 
sparsity from the viewpoint of DoF scaling, which is the most 



important channel property in the context of channel capacity 
in the limit of large signal space dimension. 

Second, throughout this work, we assume a simplistic Gaus- 
sian model for small-scale fading. However, evidence from 
measurement campaigns suggests "specular" statistics for the 
channel coefficients and some channel measurements [1], [13] 
indicate that Nakagami or log-normal distributions may be a 
more accurate fit for the small-scale fading in the wideband 
regime. While this issue is not addressed in this paper, our 
assumption of Gaussian statistics permits closed-form analysis 
and we suspect that the implications of multipath sparsity 
would hold under such statistics as well. 

V. Conclusions 

We have investigated the ergodic capacity of sparse multi- 
path channels in the ultrawideband regime. Motivated by re- 
cent measurement campaigns, we have introduced a model for 
sparse multipath channels that captures the effect of multipath 
sparsity on the statistically independent DoF in the channel 
via the notion of resolvable paths in delay and Doppler. The 
workhorse of our analysis is the use of orthogonal STF sig- 
naling that approximately diagonalizes underspread channels 
and naturally relates multipath sparsity in delay-Doppler to 
coherence in time and frequency. In particular, we proposed 
a simple block-fading model for sparse channels in the STF 
domain that captures the sub-linear scaling of the channel DoF 
with signal space dimensions. 

Our work builds on recent results on ergodic capacity in 
the wideband regime to study the impact of multipath spar- 
sity on bridging the gap between coherent and non-coherent 
regimes. The most significant implication of multipath sparsity 
is that the requirements on coherence time, $T_{\mathrm{coh}}$, in existing
works [8] are naturally replaced by requirements on the
time-frequency coherence dimension, $N_c = T_{\mathrm{coh}} W_{\mathrm{coh}}$. As
a result the requirements on channel coherence are shared 
between time and frequency thereby leading to significantly 
reduced coherence time requirements to attain a desired level 
of coherence. Our results reveal how any desired operational 
coherence can be achieved by scaling the signaling parameters 
- signaling duration T, bandwidth W and transmit power P - 
in an appropriate fashion. We also discussed the usefulness of 
peaky signaling schemes for reducing coherence requirements 
and the role played by channel sparsity in relaxing peakiness 
requirements. 

There are many interesting directions for future work. First, 
it would be useful to refine the results in this paper via more 
accurate modeling of sparsity in the time-frequency domain 
(as opposed to the block fading model). Second, studying 
the impact of non-Gaussian statistics of channel coefficients 
would also be useful. Third, while ergodic capacity is achieved 
by coding over long signaling durations, in practical settings 
with strict delay constraints, it is important to investigate more 
relevant metrics, like outage capacity [17]. An important and 
related performance metric is reliability (in terms of error 
exponents) [18]. We are currently investigating the impact of 
multipath sparsity on outage capacity and reliability. In this 
context, we recently reported a new fundamental learnability
versus diversity tradeoff in sparse channels that governs the
impact of sparsity on reliability and error probability [19].
Another interesting aspect to study is the impact of feedback 
on achievable rates [20],[21]. Finally, we note that sparse chan- 
nel models arise in other scenarios as well, such as underwater 
acoustic channels (see e.g., [22]). Thus the implications of this 
work may be applicable in such situations as well. 

Appendix 

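A. Proof of Proposition 1

As is well known, the coherent capacity expression can be computed in closed form using standard integral formulas. For this, we use the following fact [23, 4.337(1), pp. 574]:

$$\int_0^{\infty} \log_e(a + x)\, e^{-bx}\, dx = \frac{1}{b} \left[ \log_e(a) + e^{ab} \int_{ab}^{\infty} \frac{e^{-t}}{t}\, dt \right]. \tag{66}$$

Particularizing (66) to $E\left[\log_2\left(1 + \mathrm{SNR}\,|h|^2\right)\right]$ by a transformation of random variables of the form $\mathrm{Re}(h) = r\cos(\theta)$, $\mathrm{Im}(h) = r\sin(\theta)$ results in

$$C_{\mathrm{coh}}(\mathrm{SNR}) = \log_2(e)\, e^{\frac{1}{\mathrm{SNR}}} \int_{\frac{1}{\mathrm{SNR}}}^{\infty} \frac{e^{-t}}{t}\, dt. \tag{67}$$

We can then bound $C_{\mathrm{coh}}(\mathrm{SNR})$ using [24, 5.1.20, pp. 229] as

$$\frac{1}{2} \log_e(1 + 2\,\mathrm{SNR}) < e^{\frac{1}{\mathrm{SNR}}} \int_{\frac{1}{\mathrm{SNR}}}^{\infty} \frac{e^{-t}}{t}\, dt < \log_e(1 + \mathrm{SNR}). \tag{68}$$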



The upper bound of the proposition follows from a combina- 
tion of Jensen's inequality and the monotonicity of log e (l + 
x) — x + under the imposed constraints on b. The lower 
bound follows via a Taylor's series truncation. The tightness 
of the lower bound at low SNR follows from the asymptotic
(in $\frac{1}{\mathrm{SNR}}$) expansion of the exponential integral [24, 5.1.51, pp.
231]. ■
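The two-sided bound on the exponential-integral expression can be verified numerically without any special-function library. The following Python sketch assumes unit-power Rayleigh fading, so that $|h|^2$ is exponentially distributed with mean 1, and evaluates $E[\log_e(1 + \mathrm{SNR}\,|h|^2)]$ with a plain midpoint rule. The function names and the integration settings are illustrative choices.

```python
import math


def coherent_capacity_nats(snr, n=200_000, u_max=60.0):
    """E[ln(1 + snr*u)] for u ~ Exp(1), by the midpoint rule on [0, u_max]."""
    h = u_max / n
    total = 0.0
    for k in range(n):
        u = (k + 0.5) * h
        total += math.log1p(snr * u) * math.exp(-u)
    return total * h


def capacity_bounds_nats(snr):
    """Lower and upper bounds: 0.5*ln(1 + 2*snr) and ln(1 + snr)."""
    return 0.5 * math.log1p(2.0 * snr), math.log1p(snr)


if __name__ == "__main__":
    for snr in (0.01, 0.1, 1.0, 10.0):
        lo, hi = capacity_bounds_nats(snr)
        print(snr, lo, coherent_capacity_nats(snr), hi)
```

The upper bound is Jensen's inequality applied to the concave logarithm; the lower bound becomes tight as the SNR decreases, which is the regime of interest here.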



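B. Proof of Lemma 1

We begin with the vectorized system equation for the communication component of the scheme (described in (22)):

$$\mathbf{y} = \mathbf{H}\mathbf{x} + \mathbf{w} = \hat{\mathbf{H}}\mathbf{x} + \boldsymbol{\Delta}\mathbf{x} + \mathbf{w}. \tag{69}$$

Here, we have represented the $(N_c - 1)D$-dimensional communication sub-channel of the diagonal channel in (10) by $\mathbf{H}$ for simplicity. $\hat{\mathbf{H}}$ is the $(N_c - 1)D$-dimensional diagonal matrix of channel estimates and $\boldsymbol{\Delta}$ is the estimation error matrix, $\boldsymbol{\Delta} = \mathbf{H} - \hat{\mathbf{H}}$. Lumping the estimation error along with the additive noise and optimizing over the set of input covariance matrices $\mathbf{Q}$ that satisfy $\mathrm{Tr}(\mathbf{Q}) = (1 - \eta)\,TP$, a lower bound to $I_{\mathrm{tr}}$ is achieved [25] as follows:

$$I_{\mathrm{tr}} \ge \sup_{\mathbf{Q}} \frac{1}{N_c D}\, E\left[ \log_2 \det\left( \mathbf{I} + \hat{\mathbf{H}} \mathbf{Q} \hat{\mathbf{H}}^{H} \left( \mathbf{I} + \boldsymbol{\Sigma}_{\Delta x} \right)^{-1} \right) \right] \tag{70}$$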



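where $\mathbf{I}$ denotes the $(N_c - 1)D$-dimensional identity matrix. We use a zero-mean Gaussian input with covariance matrix $\mathbf{Q} = \frac{(1 - \eta)\,TP}{(N_c - 1)D}\, \mathbf{I}$. With this choice, note that

$$\boldsymbol{\Sigma}_{\Delta x} = E_{\mathbf{H}, \mathbf{x}}\left[ \boldsymbol{\Delta} \mathbf{x} \mathbf{x}^{H} \boldsymbol{\Delta}^{H} \right] = E_{\mathbf{H}}\left[ \boldsymbol{\Delta} \mathbf{Q} \boldsymbol{\Delta}^{H} \right] = \frac{(1 - \eta)\,TP}{(N_c - 1)D} \cdot \frac{1}{1 + E_{\mathrm{tr}}}\, \mathbf{I}$$

since the $h_i$ are identically distributed. Thus, we have

$$\begin{aligned} I_{\mathrm{tr}} &\ge \frac{1}{N_c D}\, E\left[ \log_2 \det\left( \mathbf{I} + \beta\, \hat{\mathbf{H}} \hat{\mathbf{H}}^{H} \right) \right] \\ &= \frac{1}{N_c D} \sum_{i=1}^{(N_c - 1)D} E\left[ \log_2\left( 1 + \beta\, |\hat{h}_i|^2 \right) \right] \\ &\overset{(a)}{=} \frac{N_c - 1}{N_c}\, E\left[ \log_2\left( 1 + \beta\, |\hat{h}|^2 \right) \right] \end{aligned} \tag{71}$$

where $\beta$ is as in (28) and (a) follows because the random variables $\{\hat{h}_i\}$ are i.i.d. Furthermore, it can be shown that the $\hat{h}_i$'s are zero-mean with $E[|\hat{h}_i|^2] = E[|\hat{h}|^2] = \sigma^2$ as in (29). We now compute the expectation in (71) in closed form. For this, we use (66) [23, 4.337(1), pp. 574]. Particularizing to $E\left[ \log_2\left( 1 + \beta\, |\hat{h}|^2 \right) \right]$ by a transformation of random variables of the form $\mathrm{Re}(\hat{h}) = r\cos(\theta)$, $\mathrm{Im}(\hat{h}) = r\sin(\theta)$ results in

$$I_{\mathrm{tr}} \ge \left( 1 - \frac{1}{N_c} \right) \log_2(e) \cdot e^{\frac{1}{\beta \sigma^2}} \int_{\frac{1}{\beta \sigma^2}}^{\infty} \frac{e^{-t}}{t}\, dt. \tag{72}$$

While (72) provides a closed-form lower bound for $I_{\mathrm{tr}}$, we need a more tractable estimate for the same. For this, we use (68) [24, 5.1.20, pp. 229]. Thus $I_{\mathrm{tr}}$ can be further lower bounded as

$$I_{\mathrm{tr}} \ge \underline{I}_{\mathrm{tr}} = \frac{N_c - 1}{2 N_c}\, \log_2\left( 1 + 2 \beta \sigma^2 \right). \tag{73}$$

This completes the proof of the lemma.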



C. Proof of Lemma 2

Since $\log(\cdot)$ is a monotonically increasing function, the tightest lower bound to $I_{\mathrm{tr}}$ is obtained by maximizing $K(\eta)$. A tedious, but straightforward, computation shows that for any $a, b > 0$, the function $f(\eta, a, b)$ defined on $\eta \in [0, 1]$ as

$$f(\eta, a, b) = \frac{\eta\,(1 - \eta)}{a + b\,(1 - 2\eta + \eta N_c)} \tag{74}$$

is concave as a function of $\eta$. Now note that $K(\eta) = N_c^2\, \mathrm{SNR}^2 f(\eta, N_c - 1, N_c\,\mathrm{SNR})$. Thus $K(\eta)$ is maximized by setting its first derivative to zero.

It is easy to check that the $\eta$ that is sought is a root of the quadratic

$$\eta^2 \left( N_c\,\mathrm{SNR}\,(N_c - 2) \right) + 2\eta \left( N_c\,\mathrm{SNR} + (N_c - 1) \right) - \left( N_c\,\mathrm{SNR} + (N_c - 1) \right) = 0$$

and is precisely $\eta^*$ as in (30). Using this value of $\eta^*$ yields the optimal $K^*$ as in (31). Thus the lemma has been established.
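The maximization step can be cross-checked by brute force. The Python sketch below assumes $K(\eta) = N_c^2\,\mathrm{SNR}^2 f(\eta, N_c - 1, N_c\,\mathrm{SNR})$ with $f$ as in (74), and compares a grid-search maximizer against the positive root of the quadratic. The operating point and the function names are arbitrary illustrative choices.

```python
import math


def k_of_eta(eta, n_c, snr):
    """K(eta) = N_c^2 SNR^2 f(eta, N_c - 1, N_c SNR), with f as in (74)."""
    a, b = n_c - 1.0, n_c * snr
    f = eta * (1.0 - eta) / (a + b * (1.0 - 2.0 * eta + eta * n_c))
    return n_c ** 2 * snr ** 2 * f


def eta_star(n_c, snr):
    """Positive root of the quadratic obtained from dK/d(eta) = 0."""
    a = n_c * snr + n_c - 1.0
    b = n_c * snr * (n_c - 2.0)
    return (a / b) * (math.sqrt(1.0 + b / a) - 1.0)


def grid_argmax(n_c, snr, steps=20_000):
    """Brute-force maximizer of K(eta) on a uniform grid over (0, 1)."""
    best = max(range(1, steps), key=lambda k: k_of_eta(k / steps, n_c, snr))
    return best / steps


if __name__ == "__main__":
    n_c, snr = 100.0, 0.1
    print(grid_argmax(n_c, snr), eta_star(n_c, snr))
```

Agreement of the two values to within the grid resolution supports the claim that the stationary point of $K(\eta)$ is its maximum on $[0, 1]$.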



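D. Proof of Theorem 1

Substituting $N_c = \frac{1}{\mathrm{SNR}^{\mu}}$ in (31), we have

$$K^* = K_1 K_2, \qquad K_1 = \frac{\mathrm{SNR}^{\mu} \left( \mathrm{SNR} + 1 - \mathrm{SNR}^{\mu} \right)}{\left( 1 - 2\,\mathrm{SNR}^{\mu} \right)^2}, \qquad K_2 = \left[ \sqrt{1 + \frac{\mathrm{SNR}^{1 - \mu} \left( 1 - 2\,\mathrm{SNR}^{\mu} \right)}{\mathrm{SNR} + 1 - \mathrm{SNR}^{\mu}}} - 1 \right]^2. \tag{75}$$

We study the low SNR asymptotics of $K^*$ for the following four cases: Case 1: $\mu = 1$, Case 2: $\mu \in (1, 3)$, Case 3: $\mu \ge 3$, and Case 4: $\mu < 1$.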

Case 1: It is not difficult to check that

$$K_1 = \mathrm{SNR} + O(\mathrm{SNR}^2), \qquad K_2 = \left( \sqrt{2 + O(\mathrm{SNR}) + O(\mathrm{SNR}^2)} - 1 \right)^2 = O(1).$$

Using the above relationships in (32), we see that the coefficient of the SNR term in the low SNR expansion of $I_{\mathrm{tr}}$ is strictly smaller than $\log_2(e)$. Thus, first-order optimality fails.
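Case 2: When $\mu \in (1, 3)$, we have

$$K_1 = \mathrm{SNR}^{\mu} \sum_{i \in \{0, 1\}} \sum_{j=0}^{\infty} O\left( \mathrm{SNR}^{i + j\mu} \right) \tag{76}$$

$$K_2 = \frac{1}{\mathrm{SNR}^{\mu - 1}} \left( 1 - 2\,\mathrm{SNR}^{\frac{\mu - 1}{2}} + 2\,\mathrm{SNR}^{\mu - 1} - \mathrm{SNR} + \cdots \right) \tag{77}$$

which implies that one of the $\mathrm{SNR}^{\mu}$, $\mathrm{SNR}^{\frac{\mu + 1}{2}}$, $\mathrm{SNR}^{\frac{3\mu - 1}{2}}$, $\mathrm{SNR}^{2\mu - 1}$ terms in $K^*$ leads to failure of the second-order optimality condition. In particular, the coefficient of the $\mathrm{SNR}^{1 + \epsilon}$ term in (34) is obtained from the coefficient of the $\mathrm{SNR}^{\frac{\mu - 1}{2}}$ term within the parenthesis in (77). However, we get exact first-order optimality in this case.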

Case 3: When $\mu \ge 3$, $K_1$ and $K_2$ are given by (76) and (77), respectively, and every vanishing term is of the form SNR or $\mathrm{SNR}^{\nu}$ for some $\nu \ge 2$. When $\mu = 3$, we note that the contribution to the coefficient of the $\mathrm{SNR}^2$ term can be obtained from (76) and (77), and equals $-3$. When $\mu > 3$, it is easy to see that we get exact second-order optimality. Thus a low SNR expansion of $I_{\mathrm{tr}}$ in the form we seek is achievable.

Case 4: When $\mu < 1$, $K_1$ is given by the same relationship as in (76). But for $K_2$ we have

$$K_2 = \left( \frac{1}{2}\, \mathrm{SNR}^{1 - \mu} \sum_{i=0}^{\infty} \sum_{j=0}^{\infty} O\left( \mathrm{SNR}^{i + j\mu} \right) \right)^2. \tag{78}$$

This results in the failure of the first-order optimality condition, since the dominant term in the Taylor series expansion of $I_{\mathrm{tr}}$ is of order $\mathrm{SNR}^{2 - \mu}$, which decays faster than SNR because $2 - \mu > 1$. ■

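E. Proof of Theorem 2

We follow the same technique as in Theorem 1. We rewrite the expression for $\eta^*$ in (30) (using $N_c = \frac{1}{\mathrm{SNR}^{\mu}}$) as

$$\eta^* = \eta_1 \eta_2, \qquad \eta_1 = \frac{\mathrm{SNR}^{\mu - 1} \left( \mathrm{SNR} + 1 - \mathrm{SNR}^{\mu} \right)}{1 - 2\,\mathrm{SNR}^{\mu}}, \qquad \eta_2 = \sqrt{1 + \frac{\mathrm{SNR}^{1 - \mu} \left( 1 - 2\,\mathrm{SNR}^{\mu} \right)}{\mathrm{SNR} + 1 - \mathrm{SNR}^{\mu}}} - 1. \tag{79}$$

To characterize the behavior of $\mathrm{MSE} = \frac{1}{1 + E_{\mathrm{tr}}}$, we analyze $E_{\mathrm{tr}} = \eta^* N_c\,\mathrm{SNR} = \mathrm{SNR}^{1 - \mu}\, \eta_1 \eta_2$. We consider the asymptotics in either of the following two scenarios: (i) fixed $\mu$ and $\mathrm{SNR} \to 0$ (as would be the case if we increase $W$ and scale $T$ appropriately, according to (18)); (ii) fixed low SNR ($\ll 1$) and increasing $\mu$ (for large but fixed $W$ and increasing $T$). The analysis is done over the following three cases: Case 1: $\mu < 1$, Case 2: $\mu = 1$ and Case 3: $\mu > 1$.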



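Case 1: When $\mu < 1$, we have

$$\eta_1 = \mathrm{SNR}^{\mu - 1} \sum_{i \in \{0, 1\}} \sum_{j=0}^{\infty} O\left( \mathrm{SNR}^{i + j\mu} \right) \tag{80}$$

$$\eta_2 = \frac{1}{2}\, \mathrm{SNR}^{1 - \mu} \sum_{i=0}^{\infty} \sum_{j=0}^{\infty} O\left( \mathrm{SNR}^{i + j\mu} \right) \tag{81}$$

This leads to

$$\eta^* = \frac{1}{2} + o(1), \qquad E_{\mathrm{tr}} = \frac{1}{2}\, \mathrm{SNR}^{1 - \mu} \left( 1 + o(1) \right) \tag{82}$$

which implies that $\eta^* \to \frac{1}{2}$ and $\mathrm{MSE} \to 1$ (since $E_{\mathrm{tr}} \to 0$).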
Case 2: When $\mu = 1$,

$$\eta_1 = 1 + O(\mathrm{SNR}), \qquad \eta_2 = \sqrt{2 + O(\mathrm{SNR}) + O(\mathrm{SNR}^2)} - 1. \tag{83}$$

The above relationships imply that $\eta^* \to 0.414$ and $\mathrm{MSE} \to 0.707$.

Case 3: For $\mu > 1$, $\eta_1$ is the same as in (80) but the asymptotic expansion for $\eta_2$ is

$$\eta_2 = \mathrm{SNR}^{\frac{1 - \mu}{2}} \left( 1 + o(1) \right). \tag{84}$$

It is easy to see in this case that $\eta^* \to 0$. Similarly it follows that $E_{\mathrm{tr}} \to \infty$ and so $\mathrm{MSE} \to 0$. Furthermore, the rates of convergence in this case can be obtained using (80) and (84) and are as illustrated in
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